An Automated Text Document Classification Framework using BERT
Authors
Abstract
Due to the rapid advancement of technology, the volume of online text data from numerous disciplines is increasing significantly over time. Therefore, more work is needed to create systems that can effectively classify text data in accordance with its content, facilitating the processing and extraction of crucial information. Traditional text classification procedures are not automated: they rely on manual feature extraction and classification, which is error-prone and time-consuming, and on choosing the most appropriate classification algorithms. They are typically resource intensive (computational, human, etc.) and are therefore not a viable solution. To address the shortcomings of the traditional approaches, we offer a unique text categorization strategy based on a well-known DL algorithm called BERT. The proposed framework is trained and tested using cutting-edge text datasets, such as the UCI email dataset, which includes spam and non-spam emails, and the BBC News dataset, which includes multiple categories such as tech, sports, politics, business, and entertainment. The proposed system achieved a highest accuracy of 91.4% and can be used by different organizations to classify text-based data with high performance. The effectiveness of the proposed framework is evaluated using different evaluation metrics such as Accuracy, Precision, and Recall.
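As a rough illustration of the three evaluation metrics named in the abstract, the sketch below computes accuracy, precision, and recall for one class from lists of true and predicted labels. It is not the authors' evaluation code: the function name `evaluate` and the toy spam/ham labels are assumptions made up for this example, and the abstract does not say how the metrics were averaged across classes.

```python
from collections import Counter


def evaluate(y_true, y_pred, positive):
    """Return (accuracy, precision, recall), treating `positive` as the positive class.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            counts["tp"] += 1  # correctly predicted positive
        elif pred == positive:
            counts["fp"] += 1  # predicted positive, actually negative
        elif truth == positive:
            counts["fn"] += 1  # missed a positive
        else:
            counts["tn"] += 1  # correctly predicted negative

    total = len(y_true)
    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]
    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy labels (assumed data, not from the UCI or BBC datasets).
y_true = ["spam", "ham", "spam", "ham", "spam"]
y_pred = ["spam", "spam", "spam", "ham", "ham"]
accuracy, precision, recall = evaluate(y_true, y_pred, positive="spam")
```

For a multi-class dataset such as the BBC News categories, one common choice is to call such a function once per class and average the per-class precision and recall; whether the paper does this is not stated in the abstract.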
Related resources
Enhanced Information Retrieval from Narrative German-language Clinical Text Documents using Automated Document Classification
The amount of narrative clinical text documents stored in Electronic Patient Records (EPR) of Hospital Information Systems is increasing. Physicians spend a lot of time finding relevant patient-related information for medical decision making in these clinical text documents. Thus, efficient and topical retrieval of relevant patient-related information is an important task in an EPR system. This...
Feature Selection Technique for Text Document Classification: An Alternative Approach
Text classification and feature selection play an important role in correctly assigning documents to a particular category, owing to the explosive growth of textual information from electronic digital documents as well as the World Wide Web. In text mining, the present challenge is to select important or relevant features from the large and vast amount of features in the data set. The aim o...
Text Document Classification: an Approach Based on Indexing
In this paper we propose a new method of classifying text documents. Unlike conventional vector space models, the proposed method preserves the sequence of term occurrence in a document. The term sequence is effectively preserved with the help of a novel data structure called ‘Status Matrix’. Further, a corresponding classification technique has been proposed for efficient classification of tex...
Text classification with sparse composite document vectors
In this work, we present a modified feature formation technique gradedweighted Bag of Word Vectors (gwBoWV) by (Vivek Gupta, 2016) for faster and better composite document feature representation. We propose a very simple feature construction algorithm that potentially overcomes many weaknesses in current distributional vector representations and other composite document representation methods w...
Improving Multi-Document Summarization via Text Classification
Developed so far, multi-document summarization has reached its bottleneck due to the lack of sufficient training data and diverse categories of documents. Text classification just makes up for these deficiencies. In this paper, we propose a novel summarization system called TCSum, which leverages plentiful text classification data to improve the performance of multi-document summarization. TCSu...
Journal
Journal title: International Journal of Advanced Computer Science and Applications
Year: 2023
ISSN: 2158-107X, 2156-5570
DOI: https://doi.org/10.14569/ijacsa.2023.0140332